-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make DcmList
random access be O(1)
#113
Conversation
@@ -254,23 +165,11 @@ DcmObject *DcmList::seek( E_ListPos pos ) | |||
|
|||
DcmObject *DcmList::seek_to(unsigned long absolute_position) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this O(n) search to an O(1) lookup is the beef of this PR.
…profiling suggested that some 50% of the time (*) was spent in `DcmList::seek_to` (which has a random-access-like interface, but internally _iterates_ over a doubly linked list) Solution: Internally use `std::deque` that provides fast O(1) random access (*): the other 50% was spent decoding jpeg tiles, which is legit use of time of course
10498d1
to
738ff84
Compare
Hi, thank you. This looks like an interesting patch (I did not try it yet); I have not been aware of std::deque which seems to be a good match for the given purpose. One "obvious" problem is that we still support compiling DCMTK with C++98 compilers (while using C++11 as the default). We might drop C++98 support in the future but right now we would need to make those guys happy as well. In the past we wrote our own classes (in ofstd, like OFVector, OFList, OFTuple etc.) as drop-in replacements for std lib features but that's probably too much work for std::deque also before the background that we might not need it any more in a few years. Another way to go is to only use the new behavior if C++11 or greater is available and otherwise fall back to the old implementation. What do you think? |
Thank you for your answer. I think So, I think two options here:
|
@@ -101,13 +61,13 @@ class DCMTK_DCMDATA_EXPORT DcmList | |||
/// destructor | |||
~DcmList(); | |||
|
|||
/** insert object at end of list | |||
/** insert object at end of list and make it be the "current" one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document the existing behavior for clarity (might be relevant on the call site).
I just read on cppreference.com: "Insertion or removal of elements - linear O(n)." |
@jriesmeier Insertion or removal at the end is O(1) which is the standard case when reading files. |
@michaelonken That's true, but reading DICOM files/datasets is not the only use case. I would rather suggest to check where @reunanen had performance problems with the existing implementation and try to enhance this particular use case. |
Yes, that makes sense. I can try and see if I could easily change the higher-level implementation to use something that might be better suited for the task perhaps. Unfortunately I don't have permission to share the example image, but I don't think it's extremely atypical or anything. For what it's worth: I started looking into this whole thing because a client (a hospital) mentioned that reading the exact same image pixels from a DICOM file was some 2x slower than reading in a corresponding Aperio .svs file. The example file is a whole-slide pathology image that is about 1 gigabyte in size. While the image itself does not really look unusual to me, I'm not sure if the DICOM representation is "typical", at least just yet. (I'm certainly not a DICOM expert.) |
I am aware of the DICOM representation of Whole Slide Microscopy Images (WSI). In fact, some of the performance improvements I referred to in my previous comment had been motivated by these huge WSI objects. Processing DICOM SOP Instances with 1 GBytes in size or even more than 100 GBytes in size is no problem at all if you know what you are doing (e.g., I implemented a software for this purpose based on the DCMTK for one of my customers). Iterating over the index of a data structure that is based on DcmList is certainly not a good idea, but I agree that this could (and should?) be improved. |
I know, however, I thought O(n) is also be the complexity for random insertion (or access in general) in DcmList. |
I have an idea for an alternative implementation; let me try and see if I can get a similar performance improvement in my use case, without sacrificing the performance of inserting in the middle of the list (which is indeed O(1) at the moment). I'll get back to you guys when I've been able to try it out. |
Closing in favor of #114. |
Problem: Reading a real-world DICOM image was a bit slow, and simple profiling suggested that some 50% of the time (*) was spent in
DcmList::seek_to
(which has a random-access-like interface, but internally iterates over a doubly linked list)Solution: Internally use
std::deque
that provides fast O(1) random access(*): the other 50% was spent decoding jpeg tiles, which is legit use of time of course